LONG-TERM GOALS

This  project  is  intended  to  advance  the  state  of  passive  acoustic  monitoring.  Improved  methods  of 
identifying  cetaceans  are  developed  in  order  to  contribute  to  the  Navy’s  mitigation  efforts. 

APPROACH 

This  project  is  a  multi-pronged  study  to  advance  the  state  of  the  field  in  three  areas.  The  development 
of  automated  auditory  scene  analysis  for  delphinid  tonal  calls  will  permit  subsequent  work  by  this 
investigator  or  others  to  exploit  the  use  of  whistles  for  classification  and  localization.  Our  approach  is 
to  dynamically  build  hypothesis  graphs  using  phase-frequency  representations  of  the  signal.  In  parallel 
to  this  effort,  two  modeling  techniques  are  being  pursued  to  improve  existing  passive  acoustic 
monitoring  capabilities  based  on  echolocation  clicks  of  odontocetes.  The  first  of  these  examines  the 
use  of  a  universal  background  model  as  proposed  by  Reynolds  et  al.  (2000)  for  human  speaker 
verification tasks. Reynolds’ problem, which is similar in nature to ours, is how to reject
observations from a speaker (or, in our case, a dolphin species) for which no data are available to train a model. We
adapt the idea of a universal background model by training a generalized odontocete model using the
data of a number of species. This model is not specific to any one species. Using Bayesian learning,
training data from a specific species adapt the parameters of the generalized model to form a species-specific model, while the generalized model serves as a
foil against vocalizations that sound similar to one of our species. The second approach for
echolocation  clicks  exploits  recent  machine  learning  work  on  submanifold  learning  (Dasgupta  and 
Freund,  2007;  Dasgupta  and  Freund,  2008;  Freund  et  al.,  2007).  In  order  to  detect  and  classify 
odontocetes, features, or salient characteristics of their signals, must be extracted from the audio
signal. As the underlying process of sound generation cannot be measured directly, nor is it well
understood, classification techniques must attempt to infer information about the producer of the signal
(e.g., species) through a typically high-dimensional set of features. Submanifold learners focus on learning a
subspace of the high-dimensional feature space that can be more conducive to robust classification.
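
The Bayesian adaptation step described above can be illustrated with the mean-only update of Reynolds et al. (2000). The following is a minimal sketch, assuming one-dimensional features, a two-component background model, and a relevance factor of 16; the function names and all numeric values are hypothetical and are not taken from the project's implementation.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def map_adapt_means(weights, means, variances, data, relevance=16.0):
    """Adapt the means of a 1-D Gaussian mixture toward `data`.

    Weights and variances are left unchanged. `relevance` controls how
    strongly the background means resist being moved by sparse data.
    """
    k = len(means)
    counts = [0.0] * k  # soft count of observations per component
    sums = [0.0] * k    # responsibility-weighted sum of observations
    for x in data:
        joint = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(joint)
        if total == 0.0:
            continue  # observation too far from every component to assign
        for i in range(k):
            resp = joint[i] / total
            counts[i] += resp
            sums[i] += resp * x
    adapted = []
    for i in range(k):
        if counts[i] == 0.0:
            adapted.append(means[i])  # no evidence: keep the background mean
            continue
        alpha = counts[i] / (counts[i] + relevance)
        adapted.append(alpha * (sums[i] / counts[i]) + (1.0 - alpha) * means[i])
    return adapted

# Toy background model and species data (hypothetical values).
ubm_weights = [0.5, 0.5]
ubm_means = [20.0, 40.0]
ubm_variances = [25.0, 25.0]
species_data = [24.0, 25.0, 26.0] * 10
species_means = map_adapt_means(ubm_weights, ubm_means, ubm_variances, species_data)
```

In this toy case the first mean moves most of the way toward the species data, while the second, which explains almost none of the data, stays close to its background value.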

WORK  COMPLETED 

The  majority  of  the  whistle  extraction  system  has  been  implemented  and  we  are  completing  scoring 
tools  to  evaluate  the  system.  A  framework  for  the  universal  background  model  detection  system  has 
been completed and evaluated using five species of delphinids from the Southern California Bight. We
have  created  a  framework  for  using  the  random  projection  tree  submanifold  learner  and  have  extended 
an implementation of the algorithm provided by Freund and Dasgupta to provide pruning capabilities,
a necessary component for tree-based classifiers, which have a tendency to overlearn when not pruned.

RESULTS 


Preliminary  qualitative  evaluations  of  the  whistle  contour  extractor  have  been  completed  on  the 
following species: bottlenose dolphins (Tursiops truncatus), melon-headed dolphins (Peponocephala
electra), and long-beaked common dolphins (Delphinus capensis). These evaluations have been presented at conferences.

Figure 1 shows a spectrogram containing many whistles and clicks, along with the detected
whistles. Annotation is under way for ground truth information, although a recent conversation with
Shannon Rankin (NOAA/NMFS) may open a better path to verification, which will be investigated in
the coming weeks. Informal tests show that the current methods are effective on signals with
significant acoustic clutter in the auditory scene, with the exception of clutter due to burst pulses, which
is a topic of continued research.


Figure 1 (color online) - Whistles extracted from long-beaked common dolphin (Delphinus
capensis) calls with a threshold of 10 dB re counts²/Hz. The upper panel shows a thresholded
spectrogram with all time × frequency bins under 10 dB set to 0, and the lower panel shows the
unthresholded spectrogram with the detected whistles. Common dolphins aggregate in large groups
and typically have many overlapping whistles.
[Figure 2 plots omitted; axis label: false alarm probability (in %)]


Figure  2  (color  online)  -  Detection  error  tradeoff  curves  for  a  species  detection  task  when  the 
impostor  species  has  not  been  seen  in  the  training  data.  A  circle  shows  the  point  on  each  curve 
where  the  miss  and  false  alarm  probabilities  are  equal  with  the  associated  error  rate  listed  in  the 
legend. The left graph shows the performance when Gaussian mixture models are trained for each
of the five species except the one being tested and the decision is based on the likelihood ratio of the
targeted species to the impostor species with the maximum likelihood. The right graph shows performance
when  a  Gaussian  mixture  model  is  created  by  Bayesian  adaptation  of  a  universal  background  model 
trained  from  all  species  except  the  species  being  tested  (the  targeted  species  and  the  species 
associated  with  impostor  calls).  Curves  that  are  closer  to  the  origin  have  better  performance. 
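
The baseline decision rule in the left graph can be sketched as a log-likelihood ratio between the targeted species model and the best-scoring impostor model. This is an illustrative sketch only, assuming one-dimensional features and averaging the log-likelihood over the clicks in a set; the function names and model parameters are hypothetical.

```python
import math

def gmm_log_likelihood(clicks, weights, means, variances):
    """Average per-click log-likelihood of 1-D data under a Gaussian mixture."""
    total = 0.0
    for x in clicks:
        density = sum(
            w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
            for w, m, v in zip(weights, means, variances)
        )
        total += math.log(density)
    return total / len(clicks)

def detection_score(clicks, target_model, impostor_models):
    """Log-likelihood ratio of the target model to the best impostor model."""
    target = gmm_log_likelihood(clicks, *target_model)
    best_impostor = max(gmm_log_likelihood(clicks, *m) for m in impostor_models)
    return target - best_impostor

# Single-component toy models: (weights, means, variances), hypothetical values.
target_model = ([1.0], [30.0], [16.0])
impostor_models = [([1.0], [40.0], [16.0]), ([1.0], [50.0], [16.0])]
score_target_like = detection_score([29.0, 31.0, 30.0], target_model, impostor_models)
score_impostor_like = detection_score([41.0, 39.0, 40.0], target_model, impostor_models)
```

A positive score favors the targeted species; sweeping an acceptance threshold over such scores traces out one detection error tradeoff curve.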


Results from the universal background model experiments are, for the moment, inconclusive. We ask
whether or not a set of 100 consecutive echolocation clicks was produced by a specific
species.  All  feature  data  from  each  sighting  are  randomly  assigned  to  one  of  three  partitions  and  a 
three-fold  cross  validation  experiment  is  run  100  times.  Within  each  fold,  the  hypothesis  is  tested 
against  test  data  from  a  specific  species  and  one  of  the  other  species.  For  each  of  the  impostor  species, 
no  training  data  from  that  species  is  used  during  model  creation.  In  the  context  of  a  baseline  Gaussian 
mixture  model  system  based  on  our  previous  work  (Roch  et  al.,  2008),  this  simply  means  that  the 
model  for  that  species  is  not  used.  For  the  universal  background  model,  a  background  model  is  trained 
using  data  from  species  other  than  the  two  being  tested  and  a  species  specific  model  is  created  by 
adapting  the  background  model’s  means  (Reynolds  et  al.,  2000).  Figure  2  shows  a  pair  of  detection 
error  tradeoff  (DET)  curves  (Martin  et  al.,  1997)  which  are  similar  to  receiver  operating  characteristic 
curves  but  scale  the  axes  based  on  normal  deviates.  The  left  DET  plot  shows  performance  for 
Gaussian  mixture  models  and  the  right  DET  plot  is  for  the  universal  background  model.  While  the 
universal  background  model  does  improve  performance  for  some  species,  it  is  at  the  expense  of  poorer 
performance  for  others  and  cannot  be  said  to  represent  an  improved  technique  at  this  time. 
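
To make the DET construction concrete, the sketch below computes miss and false alarm probabilities from detection scores, locates the equal error point, and maps a probability onto the normal-deviate axis. It assumes that higher scores favor the targeted species; the function names and scores are hypothetical, not experimental data.

```python
from statistics import NormalDist

def error_rates(target_scores, impostor_scores, threshold):
    """Miss and false alarm probabilities when scores >= threshold are accepted."""
    miss = sum(s < threshold for s in target_scores) / len(target_scores)
    false_alarm = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return miss, false_alarm

def equal_error_point(target_scores, impostor_scores):
    """Observed score at which miss and false alarm rates are closest."""
    def gap(threshold):
        miss, false_alarm = error_rates(target_scores, impostor_scores, threshold)
        return abs(miss - false_alarm)
    candidates = sorted(set(target_scores) | set(impostor_scores))
    return min(candidates, key=gap)

def normal_deviate(probability):
    """Map a probability in (0, 1) onto the normal-deviate axis of a DET plot."""
    return NormalDist().inv_cdf(probability)

# Hypothetical scores for sets of clicks from the target and an impostor species.
target_scores = [2.0, 1.5, 1.0, 0.5, -0.5]
impostor_scores = [-2.0, -1.5, -1.0, 0.0, 0.75]
eer_threshold = equal_error_point(target_scores, impostor_scores)
eer_miss, eer_false_alarm = error_rates(target_scores, impostor_scores, eer_threshold)
```

Plotting `normal_deviate(miss)` against `normal_deviate(false_alarm)` for each threshold gives the warped axes that distinguish a DET plot from a receiver operating characteristic curve.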

We hypothesize that some of the difficulty may come from working with a limited number of species.
With the two species under test held out of the five available, only three species of odontocetes remain to train the background model, and these are likely to be insufficient to characterize a general odontocete
model. We plan to supplement these data with additional data to test this hypothesis. Investigation
of  the  problematic  classification  cases  has  led  us  to  visualize  echolocation  clicks  through  click  spectra 
that  are  sorted  by  peak  frequency  (see  Figure  3).  Analysis  of  these  plots  revealed  several  trends  that 
had escaped observation by a trained analyst. A couple of our sightings such as the one shown had
echo  sounder  pings  that  were  admitted  into  the  analysis  by  our  click  detector.  When  the  spectra  are 
visualized  in  this  manner,  many  types  of  anomalies  in  the  data  set  become  easily  detectable.  In 
addition  to  the  echo  sounders,  we  observed  a  number  of  spectra  with  very  low  and  very  high  peak 
frequencies, some of which appear to be clipped. While this does not affect background model
identification any more than our baseline Gaussian mixture models, revising our feature extraction
algorithm to address these anomalies yields a relative reduction of over 15% in error rate in a species identification task on
five Southern California odontocetes when compared to our previous method of feature extraction
(Roch et al., 2008) on the same data. This reduction is a significant contribution to the research.
Results of the species identification task, which had a mean error rate of 28% using a similar
randomized experiment, are shown in Figure 4.


Figure  3  (color  online)  -  Spectra  of  over  13000  impulsive  events  recorded  in  the  presence  of  a 
November  14,  2004  sighting  of  long-beaked  common  dolphins  (Delphinus  capensis)  and  sorted  by 
peak  frequency.  Within  each  peak  frequency,  events  are  sorted  by  peak  energy.  Harmonics  of  an 
echo sounder become easy to detect at 28 and 56 kHz. The effects of D. capensis’s orientation with
respect to the hydrophone are clearly visible, with presumed on-axis echolocation clicks appearing farther
to the right, although other factors such as slant distance to the hydrophone can have profound
effects. This signal variability contributes significantly to making species identification a
challenging problem.
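
The ordering used in Figure 3 can be sketched as a two-level sort: first by the frequency of the strongest bin, then by the energy of that bin. This is a minimal illustration, assuming each spectrum is a list of energies on a uniform frequency grid; the function name, bin width, and values are hypothetical.

```python
def sort_by_peak(spectra, bin_hz):
    """Order spectra by peak frequency, breaking ties by peak energy.

    Each spectrum is a list of energies, one per frequency bin of width `bin_hz`.
    """
    def key(spectrum):
        peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
        return (peak_bin * bin_hz, spectrum[peak_bin])
    return sorted(spectra, key=key)

# Four toy spectra with three bins each (hypothetical values).
spectra = [[1, 5, 2], [9, 1, 1], [1, 3, 2], [1, 1, 7]]
ordered = sort_by_peak(spectra, bin_hz=1000.0)
```

Stacking the ordered spectra side by side turns narrowband contaminants such as echo sounder pings into contiguous bands, which is what makes them easy to spot.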


The final project sponsored by this work is the random projection tree submanifold learning algorithm
(RP-Tree).  This  project  is  theoretically  the  most  complex  of  the  three  projects  and  currently  the  least 
advanced  (as  planned  in  our  schedule).  We  have  integrated  Dasgupta  and  Freund’s  learner  into  our 
test  framework  and  implemented  tree  pruning  methods  proposed  by  Quinlan  (1993)  for  his  influential 
C4.5 tree classifier. These additions were completed in the late spring and early summer, and we have
begun  to  analyze  experiments  to  determine  how  the  model  should  be  refined.  Using  the  randomized 
cross-validation  methods  described  earlier,  we  have  trained  RP-Trees  on  the  same  Southern  California 
odontocete  classification  task.  The  overall  error  rate  is  37.9%,  which  is  significantly  higher  than  that 
of  the  Gaussian  mixture  model,  but  for  preliminary  experiments  these  results  are  not  unreasonable 
(histogram  not  shown  due  to  space  constraints).  Our  current  strategy  is  to  examine  the 
misclassifications and determine whether or not the tree pruning method is suitable and to develop an
alternative  pruning  method  based  upon  our  observations. 
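
For readers unfamiliar with random projection trees, the core operation at each node is to project the data onto a randomly chosen direction and split on the projected values. The sketch below is a simplified illustration that splits at the median; the published algorithm includes refinements in how the split point is chosen, which are omitted here, and pruning is not shown. All names and data are hypothetical.

```python
import math
import random

def random_unit_vector(dim, rng):
    """Draw a direction uniformly from the unit sphere via normalized Gaussians."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def rp_split(points, rng):
    """Split points at the median of their projections onto a random direction."""
    direction = random_unit_vector(len(points[0]), rng)
    projections = [sum(p_i * d_i for p_i, d_i in zip(p, direction)) for p in points]
    median = sorted(projections)[len(points) // 2]
    left = [p for p, proj in zip(points, projections) if proj < median]
    right = [p for p, proj in zip(points, projections) if proj >= median]
    return direction, median, left, right

# Eight toy feature vectors in three dimensions (hypothetical values).
rng = random.Random(0)
points = [[float(i), float((i * i) % 7), float(i % 3)] for i in range(8)]
direction, median, left, right = rp_split(points, rng)
```

Applying `rp_split` recursively to `left` and `right` until a node holds few points yields the tree; a classifier would then label each leaf from the training data that falls in it.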




Figure  4  -  Error  distribution  for  100  randomized  Gaussian  mixture  model  experiments  using  our 
new  feature  extraction  algorithm  with  threefold  cross  validation  on  a  species  identification  task  for 
five Southern California odontocetes: bottlenose dolphin, long- and short-beaked common
dolphins, Pacific white-sided dolphin, and Risso’s dolphin. Mean overall error rate: 28%. The
previous  feature  extraction  method  had  a  mean  error  rate  of  33%. 
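
As a check on how the two mean error rates relate to the reported reduction of over 15%, the relative change can be worked out as follows. Both means are rounded in the caption, so the result is approximate.

```latex
\[
\frac{e_{\text{old}} - e_{\text{new}}}{e_{\text{old}}}
  = \frac{33\% - 28\%}{33\%}
  \approx 0.152
\]
```

That is, an absolute drop of 5 percentage points corresponds to a relative reduction of roughly 15%.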


IMPACT/APPLICATIONS 

This  work  can  be  used  in  passive  acoustic  monitoring  platforms  for  mitigation  and  science. 